Journal of Molecular Evolution — Latest Matching Preprints

1

Four classic de novo genes all have plausible homologs and likely evolved from retro-duplicated or pseudogenic sequences

Bozorgmehr, J. H.

2023-05-30 evolutionary biology 10.1101/2023.05.28.542624 medRxiv

Top 0.1%

31.3%

Despite being previously regarded as extremely unlikely, the idea that entirely novel protein-coding genes can emerge from non-coding sequences has gradually become accepted over the past two decades. Examples of "de novo origination", resulting in lineage-specific "orphan" genes that lack orthologs, are now reported every year. However, many are likely cases of duplicates that are difficult to recognize. Here, I re-examine the claims and show that four very well-known examples of genes alleged to have emerged de novo "from scratch" (namely FLJ33706 in humans, Goddard in fruit flies, BSC4 in baker's yeast and AFGP2 in codfish) all have plausible evolutionary ancestors in pre-existing genes. In the first two cases, highly diverged retrogenes that code for regulatory proteins may have been misidentified as orphans. The antifreeze glycoproteins in cod, moreover, are shown to have likely evolved not from repetitive non-genic sequences but, as in other related cases, from an apolipoprotein that may well have been pseudogenized before later being reactivated. These findings detract from various claims made about de novo gene birth and show that there has been a tendency not to invest the necessary effort in searching for homologs beyond a very limited syntenic or phylostratigraphic methodology. The approach used here to improve homology detection draws on similarities not just in statistical sequence analysis but also in biochemistry and function, in order to avoid such failures.

2

Δ-dN/dS: New Criteria to Distinguish among Different Selection Modes in Gene Evolution

Gu, X.

2020-02-23 evolutionary biology 10.1101/2020.02.21.960450 medRxiv

Top 0.1%

26.9%

One of the most widely used measures of protein evolution is the ratio of nonsynonymous distance (dN) to synonymous distance (dS). Under the assumption that synonymous substitutions in the coding region are selectively neutral, the dN/dS ratio can be used to test for adaptive evolution: a dN/dS significantly greater than 1 indicates positive selection. However, because of selective constraints imposed on amino acid sites, most protein-coding genes show dN/dS<1. As a result, the dN/dS of a gene may be less than 1 even if some of its sites have experienced positive selection. In this paper, we develop a new criterion, called Δ-dN/dS, for positive selection testing by introducing an index H, a relative measure of rate variation among sites. In the context of strong purifying selection at some amino acid sites, our model predicts dN/dS=1-H for neutral evolution, dN/dS<1-H for nearly-neutral selection, and dN/dS>1-H for adaptive evolution. The potential of this new method for resolving the neutral-adaptive debate is illustrated by case studies. For over 4000 vertebrate genes, virtually all showed dN/dS<1-H, indicating the dominant role of nearly-neutral selection in molecular evolution. Moreover, we calculated the dN/dS ratio for cancer somatic mutations of a human gene, denoted CN/CS. For over 4000 human genes in cancer genomics, about 55% showed 1-H<CN/CS<1, about 45% showed CN/CS>1, and less than 1% showed CN/CS<1-H. Together, our analysis suggests that driver mutations, i.e., those that initiate and facilitate carcinogenesis, confer a selective advantage on cancer cells, leading to CN/CS>1 (strong positive selection) or 1-H<CN/CS<1 (weak positive selection combined with strong purifying selection), whereas nearly-neutral selection due to reduced effective clonal size is highly unlikely in cancer evolution.
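The decision rules in this abstract reduce to comparisons against the neutral expectation 1-H. The sketch below illustrates only those thresholds, under stated assumptions: H is taken as already estimated (the abstract does not say how), the comparisons are point comparisons rather than the statistical tests the author describes, and the function names and the `tol` band are hypothetical.

```python
def classify_gene(dn_ds: float, h: float, tol: float = 0.0) -> str:
    """Label a gene by comparing dN/dS with the neutral expectation 1 - H.

    h   -- among-site rate-variation index H, assumed pre-estimated and in [0, 1)
    tol -- hypothetical tolerance band around 1 - H that is treated as "neutral"
    """
    if not 0.0 <= h < 1.0:
        raise ValueError("H is assumed to lie in [0, 1)")
    neutral = 1.0 - h
    if abs(dn_ds - neutral) <= tol:
        return "neutral"
    return "nearly-neutral" if dn_ds < neutral else "adaptive"


def classify_cancer_gene(cn_cs: float, h: float) -> str:
    """Label a gene by its somatic CN/CS ratio using the abstract's three bands.

    Values exactly on a boundary (1 or 1 - H) fall into the lower band here.
    """
    if not 0.0 <= h < 1.0:
        raise ValueError("H is assumed to lie in [0, 1)")
    if cn_cs > 1.0:
        return "strong positive"
    if cn_cs > 1.0 - h:
        return "weak positive"
    return "nearly neutral"


print(classify_gene(0.2, h=0.5))         # nearly-neutral
print(classify_cancer_gene(0.8, h=0.5))  # weak positive
```

The point of the Δ criterion shows up in the second call: a gene with CN/CS below 1 would fail the classical dN/dS>1 test, yet it lies above 1-H and so falls in the weak-positive band.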

3

Protein evolution is structure dependent and non-homogeneous across the tree of life

Pandey, A.; Braun, E. L.

2020-01-29 evolutionary biology 10.1101/2020.01.28.923458 medRxiv

Top 0.1%

19.3%

Motivation: Protein sequence evolution is a complex process that varies among sites within proteins and across the tree of life. Comparisons of evolutionary rate matrices for specific taxa (clade-specific models) have the potential to reveal this variation and provide information about the underlying reasons for those changes. To study changes in patterns of protein sequence evolution, we estimated and compared clade-specific models in a way that acknowledged variation within proteins due to structure. Results: Clade-specific model fit correctly classified proteins from four specific groups (vertebrates, plants, oomycetes, and yeasts) more than 70% of the time. This was true whether we used mixture models that incorporate relative solvent accessibility or simple models that treat sites as homogeneous. Thus, protein evolution is non-homogeneous over the tree of life. However, a small number of dimensions could explain the differences among models (for mixture models, ~50% of the variance reflected relative solvent accessibility and ~25% reflected clade). Relaxed purifying selection in taxa with lower long-term effective population sizes appears to explain much of the among-clade variance. Relaxed selection on solvent-exposed sites was correlated with changes in amino acid side-chain volume; other differences among models were more complex. Beyond the information they reveal about protein evolution, our clade-specific models also represent tools for phylogenomic inference. Availability: Model files are available from https://github.com/ebraun68/clade_specific_prot_models. Contact: ebraun68@ufl.edu. Supplementary information: Supplementary data are appended to this preprint.

4

Evolution of the Cdk4/6-Cdkn2 system in invertebrates

Yuki, S.; Sasaki, S.; Yamamoto, Y.; Murakami, F.; Sakata, K.; Araki, I.

2024-09-12 evolutionary biology 10.1101/2024.05.26.595977 medRxiv

Top 0.1%

19.1%

The cell cycle is driven by cyclin-dependent kinases (Cdks). The decision whether the cell cycle proceeds is made during G1 phase, when Cdk4/6 functions. Cyclin-dependent kinase inhibitor 2 (Cdkn2) is a specific inhibitor of Cdk4/6, and their interaction depends on D84 in Cdkn2 and R24/R31 in Cdk4/6. This knowledge is based mainly on studies in mammalian cells. Here, we comprehensively analyzed Cdk4/6 and Cdkn2 in invertebrates and found that Cdk4/6 was present in most of the investigated phyla, but the distribution of Cdkn2 was rather uneven among and within the phyla. The positive charge of R24/R31 in Cdk4/6 was conserved in all analyzed species in phyla with Cdkn2. The presence of Cdkn2 and the conservation of the positive charge were statistically correlated. We also found that Cdkn2 has been tightly linked to Fas associated factor 1 (Faf1) during evolution. We discuss potential interactions between Cdkn2 and Cdk4/6 in evolution and the possible cause of the strong conservation of the microsynteny.

5

The Use of GC-, Codon-, and Amino Acid-frequencies to Understand the Evolutionary Forces at a Genomic Scale.

Elofsson, A.

2019-12-03 evolutionary biology 10.1101/863142 medRxiv

Top 0.1%

18.9%

It is well known that GC content varies enormously between organisms; this is believed to be caused by a combination of mutational preferences and selective pressure. Within coding regions, the variation in GC is larger at codon position three and smaller at positions one and two. Less well known is that this variation also has an enormous impact on the frequencies of amino acids, as their codons vary in GC content. For instance, the fraction of alanine in different proteomes varies from 1.1% to 16.5%. In general, the frequency of each amino acid correlates strongly with its number of codons, the GC content of these codons and the genomic GC content. However, there are clear and systematic deviations from the expected frequencies: some amino acids are more frequent than expected by chance, while others are less frequent. A plausible model to explain this is that two different selective forces act on genes: first, a force acting to maintain the overall GC level, and second, a selective force acting at the amino acid level. Here, we use the divergence of amino acid frequencies from those expected from GC content to analyze the selective pressure acting on codon frequencies in the three kingdoms of life. We find four major selective forces. First, the frequency of serine is lower than expected in all genomes, most of all in prokaryotes. Second, there exists a selective pressure to balance positively and negatively charged amino acids, which results in a reduction of arginine and all the negatively charged amino acids. Third, the frequency of the hydrophobic residues encoded by a T in the second codon position does not change with GC; their frequency is lower in eukaryotes than in prokaryotes. Finally, some amino acids with unique properties, such as glycine and proline, are limited in their frequency variation.
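The "expected" amino acid frequencies in this abstract depend on codon number and codon GC content. One simple way to compute such a baseline (an assumption here, not necessarily the author's exact model) is to treat the three codon positions as independent draws with P(G) = P(C) = GC/2 and P(A) = P(T) = (1 - GC)/2, sum the probabilities of each amino acid's codons in the standard genetic code, and renormalize over the 61 sense codons. The log2 ratio used for the deviation is likewise an illustrative choice, and all function names are hypothetical.

```python
import math

BASES = "TCAG"
# Standard genetic code in TCAG order (first base outermost); '*' marks stop codons.
AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]


def base_probs(gc: float) -> dict[str, float]:
    """Nucleotide probabilities for a GC fraction, assuming G = C and A = T."""
    return {"G": gc / 2, "C": gc / 2, "A": (1 - gc) / 2, "T": (1 - gc) / 2}


def expected_aa_freqs(gc1: float, gc2: float, gc3: float) -> dict[str, float]:
    """Expected amino acid frequencies if each codon position is an independent draw.

    gc1, gc2, gc3 are the GC fractions at codon positions 1-3. Stop codons are
    excluded and the remaining sense-codon probabilities are renormalized.
    """
    probs = [base_probs(gc1), base_probs(gc2), base_probs(gc3)]
    freqs: dict[str, float] = {}
    for codon, aa in zip(CODONS, AAS):
        if aa == "*":
            continue
        p = probs[0][codon[0]] * probs[1][codon[1]] * probs[2][codon[2]]
        freqs[aa] = freqs.get(aa, 0.0) + p
    total = sum(freqs.values())
    return {aa: f / total for aa, f in freqs.items()}


def log2_deviation(observed: dict[str, float], expected: dict[str, float]) -> dict[str, float]:
    """log2(observed / expected) per amino acid; positive means over-represented.

    Amino acids missing from (or zero in) `observed` are skipped.
    """
    return {aa: math.log2(observed[aa] / expected[aa])
            for aa in expected if observed.get(aa, 0.0) > 0.0}


low = expected_aa_freqs(0.3, 0.3, 0.3)
high = expected_aa_freqs(0.7, 0.7, 0.7)
print(f"Ala expected: {low['A']:.3f} at 30% GC vs {high['A']:.3f} at 70% GC")
```

Taking separate GC values per codon position lets the baseline reflect the abstract's point that GC varies most at position three; passing the same value three times reduces it to a single genomic GC.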

6

The structural proteome for the primordial glycolysis/gluconeogenesis

do O, I. J. B.; Rego, T. G.; Jose, M.; de Farias, S. T.

2019-07-18 evolutionary biology 10.1101/706192 medRxiv

Top 0.1%

18.8%

Comprehending the constitution of early biological metabolism is indispensable for understanding the origin and evolution of life on Earth. Here, we analyzed the structural proteome before the Last Universal Common Ancestor (LUCA) based on the reconstruction of the ancestral sequences and structures of proteins involved in glycolysis/gluconeogenesis. The results are compatible with the notion that the first portions of the proteins were the areas homologous to the present-day catalytic sites. Those "proto-proteins" had a simple function: binding to cofactors. Upon the accretion of new elements to the structure, the catalytic function could have emerged. Also, the first structural motifs might have been related to the emergence of the different proteins that work in modern organisms.

7

Tracing Evolutionary Ages of Cancer-Driving Sites by Cancer Somatic Mutations

Gu, X.; Zhou, Z.; Yang, J.

2020-02-10 evolutionary biology 10.1101/2020.02.09.940528 medRxiv

Top 0.1%

18.5%

Evolutionary understanding of cancer genes may provide insights into the nature and evolution of complex life and the origin of multicellularity. In this study, we focus on the evolutionary ages of cancer-driving sites and explore to what extent the amino acids at cancer-driving sites can be traced back to the most recent common ancestor (MRCA) of the gene. Following gene phylostratigraphy analysis, we define gene age (tg) as the most ancient phylogenetic position to which the gene can be traced back, in most cases based on large-scale homology searches of protein sequences. Our results show that the site-age profile of cancer-driving sites in TP53 is correlated with the number of cancer types that the somatic mutations may affect. In general, amino acid sites mutated in most cancer types are much more ancient. These sites, frequently mutated in cancerous cells, are possibly responsible for carcinogenesis; some may be very important for the basic growth of single-cell organisms, and others may contribute to complex cell regulation in multicellular organisms. Further cancer genomics analysis also indicates that the ages of cancer-driving sites are ancient but may span a broad range in the early stages of metazoan evolution.

8

L-Shaped Distributions of the Relative Substitution Rates (c/μ) in SARS-CoV-2 with or without Molecular Clocks, Challenging Mainstream Evolutionary Theories

Wu, C.; Paradis, N. J.

2024-04-30 evolutionary biology 10.1101/2024.04.29.591599 medRxiv

Top 0.1%

18.3%

A definitive test to quantify fitness changes of mutations is required to end a continuing 50-year "neutralist-selectionist" debate in evolutionary biology. Our previous work introduced a substitution-mutation rate ratio c/μ test (c: substitution rate in a Translated Region/TR or UnTranslated Region/UTR; μ: mutation rate) to quantify the selection pressure and thus the proportions of strictly neutral, nearly neutral, beneficial, and deleterious mutations in a genome. Intriguingly, both an L-shaped probability distribution of c/μ and a molecular clock were observed for the SARS-CoV-2 genome. We found that the proportions of the different mutation types from the distribution are not consistent with the hypotheses of the three existing evolutionary theories (Kimura's Neutral Theory/KNT, Ohta's Nearly Neutral Theory/ONNT and the Selectionist Theory/ST), and that a balance condition explains the molecular clock; we therefore proposed a new theory, the Near-Neutral Balanced Selectionist Theory (NNBST). In this study, the c/μ analysis was extended beyond the genome to 26 TRs, 12 UTRs, and 10 TRSs (Transcriptional Regulatory Sequences) of SARS-CoV-2. While L-shaped probability distributions of c/μ were observed for all 49 segments, molecular clocks were observed for only 24 segments, supporting NNBST and the Near-Neutral Unbalanced Selectionist Theory (NNUST) as explanations for the molecular evolution of the 24/25 segments with/without molecular clocks, respectively. Thus, the Near-Neutral Selectionist Theory (NNST) integrates traditional neutral and selectionist theories to deepen our understanding of how mutation, selection, and genetic drift influence genomic evolution.
Author Summary: The "neutralist-selectionist" debate in molecular evolution has been unresolved for 50 years because the three main theories of molecular evolution (Kimura's Neutral Theory/KNT, Ohta's Nearly-Neutral Theory/ONNT, Selectionist Theory/ST) disagree on the proportions of neutral mutations (KNT), nearly-neutral deleterious mutations (ONNT) and adaptive mutations (ST) within species. We recently developed a robust method, the c/μ relative substitution rate test, to quantify the proportion of each mutation type within >11K genomic sequences of the SARS-CoV-2 RNA virus. Our previous analysis revealed an L-shaped c/μ probability distribution and a constant substitution rate (i.e., a molecular clock) for the SARS-CoV-2 genome over 19 months, and the proportions of mutation types were inconsistent with those predicted by the three theories. We thus proposed the Near-Neutral Balanced Selectionist Theory (NNBST) to explain the molecular-clock feature and the L-shaped probability distribution for SARS-CoV-2. In this study, we extended this analysis to the 25 protein-coding gene segments and 24 non-protein-coding segments of SARS-CoV-2. We observed that all 49 segments exhibited an L-shaped probability distribution and that 24 of the 49 segments exhibited a molecular clock, whereas the remaining 25 segments did not. We thus propose NNBST and the Near-Neutral Unbalanced Selectionist Theory (NNUST) to explain the segments with and without molecular-clock features, respectively. We also coin the term Near-Neutral Selectionist Theory (NNST) to combine traditional KNT, ONNT and ST and deepen our understanding of how mutation, selection, and genetic drift influence genomic evolution.

9

Comparative Analysis of Drosophila Bam and Bgcn Sequences and Predicted Protein Structural Evolution

Arnce, L.; Bubnell, J.; Aquadro, C. F.

2024-12-18 molecular biology 10.1101/2024.12.17.628990 medRxiv

Top 0.1%

17.0%

The protein encoded by the Drosophila melanogaster gene bag of marbles (bam) plays an essential role in early gametogenesis by complexing with the gene product of benign gonial cell neoplasm (bgcn) to promote the differentiation of germline stem cell (GSC) daughters in males and females. Here, we compared the AlphaFold2 and AlphaFold Multimer predicted structures of the Bam protein and the Bam:Bgcn protein complex in D. melanogaster, D. simulans, and D. yakuba, where bam is necessary for gametogenesis, with those in D. teissieri, where it is not. Despite significant sequence divergence, we find very little evidence of structural differences in high-confidence regions of the structures across the four species. This suggests that Bam structure is unlikely to be a direct cause of its functional differences between species and that Bam may simply not be integrated in an essential manner into GSC differentiation in D. teissieri. Patterns of positive selection and significant amino acid diversification across species are consistent with the Selection, Pleiotropy, and Compensation (SPC) model, in which detected selection at bam reflects adaptive change in one major trait followed by positively selected compensatory changes for pleiotropic effects (in this case, perhaps preserving structure). In the case of bam, we suggest that the major trait could be genetic interaction with the endosymbiotic bacterium Wolbachia pipientis. Following up on detected signals of positive selection and comparative structural analysis could provide insight into the distribution of a primary adaptive change versus compensatory changes following a primary change.

10

Why is the average collateral effect of synonymous mutations so similar across alternative reading frames?

Wichmann, S.; Ardern, Z.

2022-03-25 evolutionary biology 10.1101/2022.03.22.485379 medRxiv

Top 0.1%

15.3%

The standard genetic code has been shown to have multiple interesting properties that affect molecular biology and the evolutionary process. One facet of molecular biology where code structure is particularly important is the origin and evolution of overlapping genes. We have previously reported that the structure of the standard genetic code ensures that synonymous mutations in a protein-coding gene lead to a remarkably similar average "collateral" mutation effect size in at least four of the five alternative reading frames. Here we show that only 0.26% of alternative codes with the block structure of the standard genetic code perform at least as well as the standard code in this property. Considering this finding within a code optimality framework suggests that this consistent effect size across the different frames may be adaptive. We give context for this finding and present a simple model in which a trade-off between evolvability and robustness leads to an average mutation effect size that maximises population fitness. This supports the intuition that similar mutation effects across the different alternative reading frames may be an adaptive property of the standard genetic code, one that facilitates evolvability through the use of alternative reading frames.

11

Evolutionary Transcriptome Analysis Based on Differentially Expressed (DE) Genes

Gu, X.

2020-05-19 evolutionary biology 10.1101/2020.05.16.099804 medRxiv

Top 0.1%

15.1%

To address, using high-throughput transcriptomes, how gene regulation contributes to phenotypic innovation, it is desirable to develop statistically sound methods that enable researchers to study the pattern of transcriptome evolution. Most methods currently available are based on the Ornstein-Uhlenbeck (OU) model, which treats stabilizing selection as the baseline model of transcriptome evolution. In this paper, we develop a new evolutionary approach based on the genome-wide p-value profile arising from statistical testing for differentially expressed (DE) genes between species. Our current approach focuses on estimating the transcriptome distance between species. We first establish the relationship between the evolutionary model (the Markov-chain or Poisson model) and the proportion of true null hypotheses (u0), which can be used to estimate the transcriptome distance. Further, we calculate the posterior probability that a gene is DE given its p-value, denoted Q=P(DE|p), and develop a simple algorithm to estimate the transcriptome distance for any number of genes in the genome. Our computer simulations showed that the statistical performance of these new methods is generally satisfactory.
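The abstract does not give its formulas, but the ingredients it names (a genome-wide p-value profile, the null proportion u0, and Q=P(DE|p)) fit a standard two-group model. The sketch below is one such reading under loudly stated assumptions, not the author's derivation: u0 is estimated with a Storey-style tail count, null p-values are Uniform(0,1), the DE p-value density is Beta(a, 1) with a hypothetical fixed shape a < 1, and a Poisson model in which each gene stays unchanged with probability e^(-D) gives the distance D = -ln(u0). All function names are hypothetical.

```python
import math
from typing import Sequence


def estimate_u0(pvalues: Sequence[float], lam: float = 0.5) -> float:
    """Storey-style estimate of the null proportion u0 from p-values above lam."""
    m = len(pvalues)
    above = sum(1 for p in pvalues if p > lam)
    return min(1.0, above / (m * (1.0 - lam)))


def poisson_distance(u0: float) -> float:
    """Assumed Poisson link u0 = exp(-D), hence D = -ln(u0)."""
    if not 0.0 < u0 <= 1.0:
        raise ValueError("u0 must lie in (0, 1]")
    return -math.log(u0)


def posterior_de(p: float, u0: float, a: float = 0.3) -> float:
    """Q = P(DE | p) with a Uniform(0,1) null and an assumed Beta(a, 1) DE density."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    f1 = a * p ** (a - 1.0)         # DE density; decreasing in p when a < 1
    f = u0 * 1.0 + (1.0 - u0) * f1  # mixture density; the null density is 1
    return 1.0 - u0 / f


# Toy p-values for illustration only (not data from the preprint).
pvals = [0.001, 0.004, 0.02, 0.03, 0.2, 0.55, 0.7, 0.9]
u0 = estimate_u0(pvals)
print(u0, poisson_distance(u0), posterior_de(0.004, u0))
```

In a real analysis the shape a would be fitted to the p-value histogram rather than fixed; keeping it as a parameter makes the dependence of Q on that choice explicit.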

12

Evolutionary context can clarify teleosts' gene names

Gasanov, E.; Jedrychowska, J.; Kuznicki, J.; Korzh, V.

2020-02-06 genomics 10.1101/2020.02.02.931493 medRxiv

Top 0.1%

15.1%

The initial convention for naming genes relied on historical precedent, order in the human genome or mutants in model systems. However, partial duplication of genes in teleosts required naming the duplicated genes, so ohnologs adopted the a or b extension. Rapid advances in deciphering the zebrafish genome in relation to the human genome led to genes in all other fish genomes being named following the zebrafish convention. Unfortunately, some ohnologs and their resembling orthologs suffered from incorrect nomenclature, which created confusion in particular instances such as establishing disease models. We sought to overcome this barrier by establishing an ex silico, evolution-based systematic approach to naming ohnologs in teleosts and other fish. We compared gene synteny using the spotted gar genome as the reference, since it represents the unduplicated ancestral state. Using new criteria, we identified several hundred potentially misnamed ohnologs and manually validated several of them as a proof of principle. This may help to establish a standard naming practice resulting in improved evolution-based gene nomenclature. This approach may help to identify and rename ohnologs in all relevant EMBL-EBI and NCBI databases, starting from the zebrafish genome, to avoid further proliferation of misleading information.

13

The linear correlation between genome size and the size of the non-transcribing region

Chen, Z.-R.

2024-09-22 genomics 10.1101/2024.09.19.613789 medRxiv

Top 0.1%

13.1%

Background: The genome sizes of organisms vary widely (the C-value paradox). There are non-transcribing regions in the genome that encode neither proteins nor RNA. There are several hypotheses about the function of these regions: one suggests that they are unannotated functional areas, while another views them as genomic isolation zones that reduce mutations in coding regions. Method: Statistical analysis was conducted on transcribing regions (including areas annotated as genes and transcribed pseudogenes), non-transcribing regions, protein-coding regions (coding sequence, CDS), and genome sizes, using annotation files from 63,866 species genomes in the NCBI RefSeq database. Results: There is a significant linear relationship between the size of non-transcribing genomic regions and overall genome size across species, with varying proportional coefficients among different phyla (realms for viruses). As genome size increases, the proportion of non-transcribing regions gradually rises, eventually approaching a linear proportional limit, resembling one branch of a hyperbola. Eukaryotes show a high linear correlation, highest in Streptophyta and lowest in Apicomplexa. In eukaryotes, the size of the coding region increases with genome size, but the increasing trend diminishes (the proportion decreases). In non-eukaryotes, the size of the coding region maintains a linear relationship with genome size. Conclusion: The size of the non-transcribing region in species may be subject to some strict quantitative control mechanism: genome and non-transcribing genome sizes increase proportionally with the expansion of the transcribing genome, indicating a strict balance between expansion and energy conservation. The proportion of non-transcribed genome in eukaryotes is conserved (although the sequences are not), and the presence of non-transcribing genomes has significant implications for the evolution or survival of species.
Thus, I propose a new hypothesis about the non-transcribing genome: that it is a space for generating new genes from scratch, and that the different proportional coefficients among phyla are due to their different positions in energy transfer.

14

Accurate prediction of site- and amino-acid substitution rates with a mutation-selection model

Andre, I.

2024-03-06 evolutionary biology 10.1101/2024.03.02.583099 medRxiv

Top 0.1%

12.9%

The pattern of substitutions at sites in proteins provides invaluable information about their biophysical and functional importance and about the selection pressures acting at individual sites. Amino acid site rates are typically estimated using phenomenological models in which sequence variability is described by rate factors that scale the overall substitution rate of a protein to individual sites. In this study, we demonstrate that site rates can be calculated accurately from amino acid sequences using a mutation-selection model in combination with a simple nucleotide substitution model. The method performs better than the standard phylogenetic approach on sequences generated by structure-based evolutionary dynamics simulations, robustly estimates rates for shallow multiple sequence alignments, and can also be calculated rapidly for larger sequence alignments. On natural sequences, site rates from the mutation-selection model are strongly correlated with rates calculated with empirical Bayes methods. The model provides a link between amino acid substitution rates and equilibrium frequency distributions at sites in proteins. We show how an ensemble of equilibrium frequency vectors can be used to represent the rate variation encoded in empirical amino acid substitution matrices. This study demonstrates that a rapid and simple method can be developed from the mutation-selection model to predict substitution rates from amino acid data, complementing the standard phylogenetic approach.
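As a rough illustration of how equilibrium frequencies can be turned into a site rate, the sketch below uses the Halpern-Bruno form of a mutation-selection model at the amino acid level: the scaled selection coefficient for i to j is S = ln(π_j/π_i), the fixation factor is S/(1 - e^(-S)), and the rate is normalized so that a site with uniform frequencies has rate 1. Two simplifications are assumptions of this sketch, not the author's method: mutation is taken to be uniform among amino acids (the abstract couples the model to a nucleotide substitution model instead), and frequencies must be strictly positive (in practice a pseudocount would be added to alignment-column counts). Function names are hypothetical.

```python
import math
from typing import Sequence


def fixation_factor(s: float) -> float:
    """Relative fixation factor S / (1 - exp(-S)); equals 1 in the neutral limit."""
    if abs(s) < 1e-12:
        return 1.0
    return s / (1.0 - math.exp(-s))


def site_rate(freqs: Sequence[float]) -> float:
    """Relative substitution rate at a site from its equilibrium frequencies.

    Assumes uniform mutation among states; normalized so uniform freqs give 1.0.
    """
    k = len(freqs)
    if k < 2 or any(f <= 0.0 for f in freqs):
        raise ValueError("need at least two states with strictly positive frequencies")
    total = sum(freqs)
    pi = [f / total for f in freqs]
    rate = 0.0
    for i in range(k):
        for j in range(k):
            if i != j:
                s = math.log(pi[j] / pi[i])  # scaled selection coefficient, i -> j
                rate += pi[i] * fixation_factor(s)
    return rate / (k - 1)  # neutral rate: unit mutation to each of k - 1 states


uniform = [1 / 20] * 20
conserved = [0.9] + [0.1 / 19] * 19
print(site_rate(uniform), site_rate(conserved))
```

A site dominated by one residue gets a rate below 1, matching the intuition that strong selection for a single amino acid slows substitution; with uniform mutation the normalized rate cannot exceed 1.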

15

Origin and Evolution of DNA methyltransferases (DNMT) along the tree of life: A multi-genome survey.

Bhattacharyya, M.; De, S.; Chakrabarti, S.

2020-04-09 evolutionary biology 10.1101/2020.04.09.033167 medRxiv

Top 0.1%

12.9%

Background: Cytosine methylation is a common DNA modification found in most eukaryotic organisms, including plants, animals, and fungi. (Cytosine-5)-DNA methyltransferases (C5-DNA MTases) belong to the DNMT family of enzymes, which catalyze the transfer of a methyl group from S-adenosyl methionine (SAM) to cytosine residues of DNA. In mammals, four members of the DNMT family have been reported: DNMT1, DNMT3a, DNMT3b and DNMT3L, but only DNMT1, DNMT3a and DNMT3b possess methyltransferase activity. There have been many reports about the methylation landscape in different organisms, yet there is no systematic report of how DNA (C5) methyltransferases have evolved in different organisms. Result: DNA methyltransferases are found in all three domains of life. However, significant variability has been observed in length, copy number and sequence identity when compared across kingdoms. Sequence conservation is greatly increased in invertebrates and vertebrates compared to other groups. Similarly, sequence length has increased while domain lengths remain more or less conserved. Vertebrates are also found to be associated with more conserved DNMT domains. Finally, comparison between single nucleotide polymorphisms (SNPs) prevailing in human populations and evolutionary changes in the vertebrate DNMT alignment revealed that most of the SNPs were conserved in vertebrates. Conclusion: The sequences (including the catalytic domain and motifs) and structures of the DNMT enzymes have evolved greatly from bacteria to vertebrates, with a steady increase in complexity and specificity. This study provides a systematic report of the evolution of DNA methyltransferase enzymes across different lineages of the tree of life.

16

Evolution and origin of sliding clamp in bacteria, archaea and eukarya

Acharya, S.; Dahal, A.; Bhattarai, H. K.

2020-10-09 molecular biology 10.1101/2020.10.09.332825 medRxiv

Top 0.1%

12.7%

Replication of DNA is an essential process in all domains of life. A protein involved in replication across all domains of life is the sliding clamp. The sliding clamp encircles the DNA and helps the replicative polymerase stay attached to the replication machinery, increasing the processivity of the polymerase. In eukaryotes and archaea, the sliding clamp is called Proliferating Cell Nuclear Antigen (PCNA) and consists of two domains. PCNA forms a trimer that encircles the DNA as a hexameric (six-domain) ring. In bacteria, the structure of the sliding clamp is highly conserved, but the protein itself, called the beta clamp, contains three domains and dimerizes to form a hexameric ring. The bulk of the literature emphasizes the conservation of sliding clamp structure but fails to recognize conservation of protein sequence among sliding clamps. In this paper, we used PSI-BLAST at NCBI, run to the second iteration, to show statistically significant sequence homology between Pyrococcus furiosus PCNA and Kallipyga gabonensis beta clamp. The last two domains of the beta clamp align with the two domains of PCNA. These homology data demonstrate that PCNA and the beta clamp arose from a common ancestor. We further used beta clamp and PCNA sequences from diverse bacteria, archaea and eukarya to build a maximum likelihood phylogenetic tree. Most, but not all, species in the different domains of life harbor one sliding clamp from vertical inheritance. Some species that have two or more sliding clamps acquired them through gene duplication or horizontal gene transfer events.

17

Possible changes in fidelity of DNA polymerase δ in ancestral mammals

Katoh, K.; Iwabe, N.; Miyata, T.

2020-10-30 evolutionary biology 10.1101/2020.10.29.327619 medRxiv

Top 0.1%

12.2%

DNA polymerase δ (polδ) is one of the major DNA polymerases that replicate chromosomal genomes in eukaryotes. Given the essential role of this protein, its phylogenetic tree was expected to reflect the relationships between taxa, like those of many other essential proteins. However, the tree of the catalytic subunit of polδ showed unexpectedly strong heterogeneity in evolutionary rate at the amino acid level among vertebrate lineages, suggesting unusual amino acid substitutions specifically in the ancestral mammalian lineage. Structural and phylogenetic analyses were used to pinpoint where and when these amino acid substitutions occurred: around the 3'-5' exonuclease domain, in later mammalian ancestry, after the split between monotremes and therians. The 3'-5' exonuclease domain of this protein is known to affect the fidelity of replication. Based on these observations, we explored the possibility that the amino acid substitutions we identified in polδ affected the mutation rate of entire chromosomal genomes during this period.

18

Determining the effects of temperature on the evolution of bacterial tRNA pools

Jain, V.; Cope, A. L.

2023-10-09 evolutionary biology 10.1101/2023.09.26.559538 medRxiv

Top 0.1%

12.0%

The genetic code consists of 61 codons coding for 20 amino acids. These codons are recognized by transfer RNAs (tRNAs) that bind to specific codons during protein synthesis. Most organisms use fewer than all 61 possible anticodons because of wobble base pairing: the ability of an anticodon to mismatch with a codon at its third nucleotide. Previous studies observed a correlation between the tRNA pool of bacteria and the temperature of their respective environments. However, it is unclear whether these patterns represent biological adaptations to maintain the efficiency and accuracy of protein synthesis in different environments. A mechanistic mathematical model of mRNA translation is used to quantify the expected elongation rates and error rates for each codon based on an organism's tRNA pool. A comparative analysis across a range of bacteria that accounts for covariance due to shared ancestry is performed to quantify the impact of environmental temperature on the evolution of the tRNA pool. We find that thermophiles generally have more anticodons represented in their tRNA pool than mesophiles or psychrophiles. Based on our model, this increased diversity is expected to lead to increased missense errors. The implications of this for protein evolution in thermophiles are discussed. Significance: Protein synthesis is a vital biological process; however, our understanding of the impact of environmental factors, such as temperature, on the evolution of the molecular mechanisms involved in protein synthesis is limited. In this study, we investigated the impact of environmental temperature on the evolution of the tRNA pool. Our analyses revealed that heat-loving bacteria (thermophiles) generally have more anticodons represented in their tRNA pool. Based on a simple model of ribosome elongation, this observed increase in tRNA diversity in thermophiles is expected to also increase the frequency of translation errors.
We speculate that the increased diversity of the tRNA pool could be due to the decreased efficiency of wobble base pairing at higher temperatures, necessitating more tRNA with exact codon-anticodon pairings. Our findings provide key insights into the role of the environment in shaping the tRNA pool.
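To make the abstract's reasoning concrete, here is a deliberately simplified toy sketch. It is not the authors' mechanistic model. It assumes that a codon's elongation rate is proportional to the abundance of tRNAs that decode it, that wobble reads count with a reduced efficiency `w` (the abstract speculates that wobble efficiency falls at high temperature), and that the missense probability is the share of near-cognate tRNAs among all tRNAs that could be accepted. The wobble rules, the near-cognate definition (one mismatch in the first two positions, amino acid identity ignored), the function names and the abundances are all hypothetical choices made for illustration.

```python
from typing import Dict, Optional

# Hypothetical toy pool: keys are the codon each tRNA reads exactly,
# values are relative tRNA abundances.
Pool = Dict[str, float]

# Simplified third-position wobble rules (an assumption of this sketch):
# a tRNA whose exact codon ends in C also reads U-ending codons (G:U wobble),
# and one whose exact codon ends in A also reads G-ending codons (U:G wobble).
WOBBLE = {"C": "U", "A": "G"}


def third_position_read(trna_codon: str, codon: str) -> Optional[str]:
    """Classify how a tRNA reads the codon's third position."""
    if trna_codon[2] == codon[2]:
        return "exact"
    if WOBBLE.get(trna_codon[2]) == codon[2]:
        return "wobble"
    return None


def cognate_supply(codon: str, pool: Pool, w: float) -> float:
    """Abundance of decoding tRNAs, with wobble reads down-weighted by w."""
    total = 0.0
    for trna, abundance in pool.items():
        if trna[:2] != codon[:2]:
            continue  # first two positions must pair exactly
        read = third_position_read(trna, codon)
        if read == "exact":
            total += abundance
        elif read == "wobble":
            total += w * abundance
    return total


def near_cognate_supply(codon: str, pool: Pool) -> float:
    """Abundance of tRNAs with one mismatch in the first two positions and a
    readable third position. Simplification: amino acid identity is ignored,
    so every near-cognate tRNA counts as a potential missense error."""
    total = 0.0
    for trna, abundance in pool.items():
        mismatches = sum(a != b for a, b in zip(trna[:2], codon[:2]))
        if mismatches == 1 and third_position_read(trna, codon) is not None:
            total += abundance
    return total


def elongation_rate(codon: str, pool: Pool, w: float, k: float = 1.0) -> float:
    """Toy elongation rate: rate constant k times effective cognate supply."""
    return k * cognate_supply(codon, pool, w)


def missense_probability(codon: str, pool: Pool, w: float) -> float:
    """Chance that an accepted tRNA is near-cognate rather than cognate."""
    cognate = cognate_supply(codon, pool, w)
    near = near_cognate_supply(codon, pool)
    return near / (cognate + near) if cognate + near > 0 else 0.0


if __name__ == "__main__":
    pool = {"GGC": 10.0, "GAC": 5.0}
    richer = {**pool, "GGU": 2.0}  # add an exact-match tRNA for GGU
    for name, p in [("pool", pool), ("richer", richer)]:
        print(name, elongation_rate("GGU", p, w=0.5),
              round(missense_probability("GGU", p, w=0.5), 3),
              round(missense_probability("GAU", p, w=0.5), 3))
```

Under these toy assumptions, adding an exact-match GGU tRNA raises the GGU elongation rate from 5.0 to 7.0 and lowers its missense probability from 0.5 to about 0.417, but raises the missense probability of the neighboring codon GAU from 0.8 to about 0.828, because the new tRNA is near-cognate for it. This mirrors, in miniature, the trade-off the abstract describes between a more diverse tRNA pool and more missense errors.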

19

Return of a lost structure in the evolution of felid dentition revisited: A DevoEvo perspective on the irreversibility of evolution

Lynch, V. J.

2021-02-05 evolutionary biology 10.1101/2021.02.04.429820 medRxiv

Top 0.1%

11.7%

There is a longstanding interest in whether the loss of complex characters is reversible (so-called "Dollo's law"). Reevolution has been suggested for numerous traits, but one of the first such proposals was by Kurtén (1963), who argued that the presence of the second lower molar (M2) in the Eurasian lynx (Lynx lynx) violated Dollo's law because all other felids lack M2. Although this is an early and often-cited example of the reevolution of a complex trait, Kurtén (1963) and Werdelin (1987) used an ad hoc parsimony argument to support their proposition that M2 reevolved in the Eurasian lynx. Here I revisit the evidence that M2 reevolved in the Eurasian lynx using explicit parsimony and maximum likelihood models of character evolution, and find strong evidence that Kurtén (1963) and Werdelin (1987) were correct: M2 reevolved in the Eurasian lynx. Next, I explore the developmental mechanisms that may explain this violation of Dollo's law and suggest that the reevolution of lost complex traits may arise from the reevolution of cis-regulatory elements and protein-protein interactions, which have a longer half-life after silencing than protein-coding genes. Finally, I present a developmental model to explain the reevolution of M2 in the Eurasian lynx.
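The "explicit parsimony" step can be illustrated with Fitch's algorithm for a single binary character. The sketch below is a minimal illustration, assuming M2 is coded as 1 (present) or 0 (absent), and it runs on a hypothetical five-taxon tree: `Lynx_other` and `Outgroup` are placeholders, and neither the tree nor the states are the felid phylogeny or character data used in the preprint. The maximum likelihood analysis mentioned in the abstract is not shown.

```python
from typing import Dict, Set, Tuple, Union

# A tree is either a leaf name or a (left, right) pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch(tree: Tree, states: Dict[str, int]) -> Tuple[Set[int], int]:
    """Bottom-up pass of Fitch parsimony for one character on a binary tree.

    Returns the candidate state set at the root of `tree` and the minimum
    number of state changes required within it. The top-down pass that
    assigns final states to internal nodes is omitted.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    shared = left_set & right_set
    if shared:
        return shared, left_cost + right_cost
    # no shared state: take the union and count one change
    return left_set | right_set, left_cost + right_cost + 1


if __name__ == "__main__":
    # Hypothetical topology and M2 states (1 = present, 0 = absent).
    felids = ((("Lynx_lynx", "Lynx_other"), "Felis"), "Panthera")
    m2 = {"Lynx_lynx": 1, "Lynx_other": 0, "Felis": 0,
          "Panthera": 0, "Outgroup": 1}
    print(fitch(felids, m2))               # ({0}, 1)
    print(fitch((felids, "Outgroup"), m2))  # ({0, 1}, 2)
```

On this toy tree the felid node's set is {0}, so the felid ancestor is reconstructed as lacking M2 and the single change inside the felid clade is a gain of M2 in Lynx lynx, which is the shape of the argument the abstract formalizes.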

20

Whole-Proteome Tree of Arthropods: An "alignment-free" phylogeny of proteome "books"

JaeJin Choi; Byung-Ju Kim; Sung-Hou Kim

2020-07-12 evolutionary biology 10.1101/2020.07.11.198689 medRxiv

Top 0.1%

11.1%

An "organism tree" of insects, the largest and most species-diverse group of all living animals, can be considered a conceptual tree that captures a simplified narrative of the complex evolutionary history of extant insects. Currently, the most common approach is to construct a "protein tree" as a surrogate for the organism tree, using Multiple Sequence Alignment (MSA) of highly homologous regions of a set of selected proteins to represent each organism. However, such selected regions account for a very small fraction of each organism's whole proteome. Information theory provides a way to compare two whole proteomes (the complete sets of proteins of two organisms) without MSA: by treating each whole-proteome sequence as a "book" written in the amino acid alphabet, the information content of two whole proteomes can be compared quantitatively using the theory's text comparison methods, without sequence alignment. This provides an opportunity to construct a "whole-proteome tree" of insects as a surrogate for an organism tree of insects. The whole-proteome tree of the insects in this study shows that: (a) the founders of all the major insect groups emerged in an explosive "burst" near the root of the tree, (b) the most basal group of all insects is a subgroup of Hemiptera consisting of aphids and psyllids, and (c) there are other notable differences in the phylogeny of the groups compared with recent protein trees of insects.
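The abstract does not name the exact text comparison measure, so the following is only a sketch of one common information-theoretic, alignment-free approach, assumed here rather than confirmed as the authors' method: each proteome is reduced to a profile of relative frequencies of length-k "words" (k-mers), and two profiles are compared with the Jensen-Shannon divergence. The word length `k`, the function names and the toy proteomes are hypothetical. Building a tree from the resulting pairwise distances (for example, by neighbor joining) is not shown.

```python
import math
from collections import Counter
from typing import Dict, Iterable


def kmer_profile(proteome: Iterable[str], k: int) -> Dict[str, float]:
    """Relative frequencies of all length-k words across a proteome's proteins."""
    counts: Counter = Counter()
    for protein in proteome:
        # count words within each protein so no k-mer spans two proteins
        for i in range(len(protein) - k + 1):
            counts[protein[i:i + k]] += 1
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def jensen_shannon(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Jensen-Shannon divergence in bits (0 = identical, 1 = no shared words)."""
    words = set(p) | set(q)
    m = {w: 0.5 * (p.get(w, 0.0) + q.get(w, 0.0)) for w in words}

    def kl_to_m(dist: Dict[str, float]) -> float:
        # Kullback-Leibler divergence of dist from the midpoint profile m
        return sum(f * math.log2(f / m[w]) for w, f in dist.items() if f > 0)

    return 0.5 * kl_to_m(p) + 0.5 * kl_to_m(q)


if __name__ == "__main__":
    # Hypothetical toy "proteomes"; real ones contain thousands of proteins.
    a = ["MKVLAAGIV", "MSTNPKPQR"]
    b = ["MKVLAAGIL", "MSTNPKPQK"]
    c = ["WYWYHHEEDD"]
    pa, pb, pc = (kmer_profile(x, 3) for x in (a, b, c))
    print(jensen_shannon(pa, pb), jensen_shannon(pa, pc))
```

Proteomes sharing many words (a and b) get a divergence well below 1, while proteomes with no words in common (a and c) reach the maximum of 1 bit, so the divergence can serve as a distance for tree building without any alignment step.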